Assessing the Relationship between Train Strike Trespasser Fatalities in California and Population Density

DSAN 6750 / PPOL 6805: GIS for Spatial Data Science

Author: Lindsay Strong
Affiliation: Georgetown University

Introduction

From 2012 to 2017, there were 3,687 railroad trespasser fatalities across the United States (Kidda et al. 2020). Previous studies have assessed trends among trespasser strikes and emphasized that trespasser strikes are an urban problem rather than a rural one. I will assess the relationship between population density and trespasser strikes using spatial data science techniques.

Because California has the most trespasser fatalities of any U.S. state, I will limit my analysis to California (Kidda et al. 2020). I will use trespasser strike data from the Department of Transportation, which includes the latitude and longitude of each strike as point data.

By assessing the relationship between trespasser fatalities in California and population density, I hope to apply my findings to targeted interventions to prevent future trespasser strikes.

Literature Review

The Federal Railroad Administration (FRA) assessed trends in trespasser train strikes from 2012 to 2017. California, New York, Florida, and Texas had the most trespasser strikes of all U.S. states (Kidda et al. 2020). The FRA identified trends among suicides, train types, time of day, age, and the individual’s action at the time of death (Kidda et al. 2020). The FRA report does not assess the relationship between population density and trespasser strikes.

Northwestern economics professor Ian Savage notes in his 2007 manuscript, Trespassing on the Railroad, that trespasser strikes appear to be an urban problem rather than a rural one, as “less than one quarter of fatalities occur outside of town or city limits” (Savage 2007).

Methodology

For population density, I retrieved 2020 California census tract data with tidycensus and divided the total population of each tract by its total area. For the hypothesis testing, I computed the centroid of each census tract to construct the underlying density.
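The density calculation described above can be sketched in a few lines. This is a minimal Python illustration rather than the notebook's code; the tract labels, populations, and areas are made-up values, and the choice of square kilometers as the unit is an assumption.

```python
# Hypothetical tract records: (label, total population, area in square meters).
# These values are invented for illustration and are not census figures.
tracts = [
    ("tract_A", 3000, 6_000_000.0),
    ("tract_B", 2000, 500_000.0),
    ("tract_C", 0, 1_200_000.0),
]

SQ_M_PER_SQ_KM = 1_000_000.0


def population_density(population, area_sq_m):
    """Return people per square kilometer, or 0.0 for a tract with no area."""
    if area_sq_m <= 0:
        return 0.0
    return population / (area_sq_m / SQ_M_PER_SQ_KM)


# Map each tract label to its density.
densities = {label: population_density(pop, area) for label, pop, area in tracts}
for label, density in densities.items():
    print(label, round(density, 1))
```

Guarding against a zero area avoids a division error for degenerate tracts; how such tracts should be treated in the analysis is a separate decision.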

Exploratory Data Analysis (EDA)

The map below shows the 1,528 trespasser fatalities (in purple) in California from 2011 to 2022.

By looking only at census tracts where fatalities have occurred, we can see hotspots for strikes in Northern California, specifically around the Richmond and Berkeley area as well as around the Davis and Modesto areas. There appear to be fewer strikes in Southern California, although there is a cluster around the Pomona and Ontario area.

Global Moran’s I

Using Moran’s I for spatial autocorrelation, we can determine whether the data is clustered, and we can use Local Moran’s I to identify where these clusters lie. I used all of the census tracts across California, even those without strikes, in order to assess clustering.

The result from Moran’s I test is displayed below.

A value of roughly 0.16 indicates a slight positive autocorrelation so we can conclude that nearby census tracts have slightly similar numbers of strikes.
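To make the statistic concrete, the sketch below computes Global Moran's I on a toy example of four tracts arranged in a line. It is a minimal illustration, not the notebook's code: the binary neighbor weights, the toy counts, and the function name are all assumptions, and the actual analysis may have used a different weighting scheme.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary weights (w_ij = 1 when j is a neighbor of i)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0 is the sum of all weights; with binary weights this is the
    # number of directed neighbor links.
    s0 = sum(len(nbrs) for nbrs in neighbors.values())
    # Sum of cross-products of deviations over every neighbor pair.
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


# Four hypothetical tracts in a row: 0 - 1 - 2 - 3.
line_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

# Similar counts sit next to each other, so I is positive.
i_clustered = morans_i([10, 8, 2, 0], line_neighbors)

# High and low counts alternate, so I is negative.
i_alternating = morans_i([10, 0, 10, 0], line_neighbors)

print(round(i_clustered, 3), round(i_alternating, 3))
```

Values near zero, such as the roughly 0.16 reported here, sit between these two extremes and indicate only weak similarity between neighboring tracts.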

Local Moran’s I

The Local Moran’s I test identifies areas where strikes are clustered together. The results of the Local Moran’s I test are shown in the map below.
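As a rough illustration of how the local statistic is built, the sketch below computes a Local Moran's I value and a cluster label for each tract in a toy example. It assumes row-standardized neighbor weights and made-up counts, and it does not assess significance, which a full analysis would do before mapping clusters.

```python
def local_morans_i(values, neighbors):
    """Return a (local I, cluster label) pair for each observation.

    Uses row-standardized weights: the spatial lag of a tract is the
    mean deviation of its neighbors.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # variance of the values
    results = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        # A tract with no neighbors gets a spatial lag of zero.
        lag = sum(dev[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        local_i = dev[i] * lag / m2
        if dev[i] >= 0 and lag >= 0:
            label = "High-High"
        elif dev[i] < 0 and lag < 0:
            label = "Low-Low"
        elif dev[i] >= 0:
            label = "High-Low"
        else:
            label = "Low-High"
        results.append((local_i, label))
    return results


# Four hypothetical tracts in a row: 0 - 1 - 2 - 3.
line_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
local_results = local_morans_i([10, 8, 2, 0], line_neighbors)
for tract, (local_i, label) in enumerate(local_results):
    print(tract, round(local_i, 3), label)
```

Tracts labeled High-High are the candidates for strike hotspots, while High-Low and Low-High tracts are spatial outliers relative to their neighbors.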

Hypothesis Testing

I will conduct hypothesis testing using 999 Monte Carlo simulations. My hypotheses are as follows:

Null Hypothesis: The number of trespasser fatalities in a census tract is directly proportional to its population density.

Alternative Hypothesis: The number of trespasser fatalities in a census tract is greater than or less than what would be expected by population density alone.

This is a first-order hypothesis because we are assessing a direct relationship between fatalities and population density. The underlying density, shown below, is computed using the population densities at the centroids of each census tract with fatalities. The high density areas in San Francisco and Los Angeles are shown in yellow on the density plot.

We will divide the population density into three areas (high density, medium density, and low density) in order to run hypothesis testing. The density plot below shows higher population density near San Francisco and Los Angeles and lower population density on the eastern side of California.
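The text does not state how the cut points between low, medium, and high density were chosen. The sketch below assumes tertiles, meaning cut points that split the density values into three equally sized groups; both that choice and the density values are assumptions made for illustration.

```python
import statistics


def density_classes(densities):
    """Label each density Low, Medium, or High using tertile cut points.

    Tertiles are an assumed rule; any other pair of cut points could be
    substituted without changing the rest of the logic.
    """
    low_cut, high_cut = statistics.quantiles(densities, n=3)
    labels = []
    for d in densities:
        if d <= low_cut:
            labels.append("Low")
        elif d <= high_cut:
            labels.append("Medium")
        else:
            labels.append("High")
    return labels


# Hypothetical densities in people per square kilometer.
example_densities = [10, 50, 200, 800, 3000, 9000]
example_labels = density_classes(example_densities)
print(list(zip(example_densities, example_labels)))
```

Because the final counts depend on where the boundaries fall, the cut points are worth reporting alongside the results.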

In order to assess our hypothesis, we must compute the intensity function for trespasser fatalities. The intensity function is shown below. The plot highlights the concentration of strikes in Northern California around San Francisco.
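One common way to estimate an intensity function from point data is kernel smoothing. The sketch below uses a Gaussian kernel with no edge correction on made-up coordinates. Whether the analysis used this estimator is not stated, so the function name, the bandwidth, and the points are all assumptions.

```python
import math


def kernel_intensity(points, location, bandwidth):
    """Gaussian kernel estimate of point intensity at `location`.

    Each point contributes a bivariate Gaussian bump with standard
    deviation `bandwidth`; no edge correction is applied.
    """
    x0, y0 = location
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for x, y in points:
        sq_dist = (x - x0) ** 2 + (y - y0) ** 2
        total += norm * math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total


# Hypothetical strike locations in projected coordinates (kilometers):
# three points close together and one far away.
strikes = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.4), (10.0, 10.0)]

near_cluster = kernel_intensity(strikes, (0.2, 0.2), bandwidth=1.0)
far_from_cluster = kernel_intensity(strikes, (5.0, 5.0), bandwidth=1.0)
print(near_cluster > far_from_cluster)
```

The bandwidth controls how smooth the surface is: a small value produces sharp peaks at each strike, while a large value blurs nearby clusters together.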

Using the fatality intensity function and the population density areas, we can compute the number of fatalities in high density, medium density, and low density areas. These values are shown below.

   Low Medium   High 
    11    189   1321 

After running 999 Monte Carlo simulations, we can show the distribution of the fatalities in each of the density areas compared to the observed number of fatalities in those areas.
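A simplified stand-in for this simulation step is sketched below. Instead of simulating point locations from a density surface, it allocates each simulated fatality directly to a density area with probability equal to that area's share under the null hypothesis. The shares in `null_weights` are hypothetical placeholders, not values from the analysis; the total of 1,521 points is the sum of the observed counts reported above.

```python
import random


def simulate_counts(n_points, class_weights, n_sims, seed=0):
    """Simulate counts per density area under the null hypothesis.

    Each simulated fatality lands in an area with probability
    proportional to that area's weight (a multinomial draw).
    """
    rng = random.Random(seed)
    classes = list(class_weights)
    weights = [class_weights[c] for c in classes]
    sims = []
    for _ in range(n_sims):
        draws = rng.choices(classes, weights=weights, k=n_points)
        sims.append({c: draws.count(c) for c in classes})
    return sims


# Hypothetical shares of the underlying population density in each area.
null_weights = {"Low": 0.05, "Medium": 0.35, "High": 0.60}

# 1,521 = 11 + 189 + 1,321 observed fatalities across the three areas.
sims = simulate_counts(n_points=1521, class_weights=null_weights, n_sims=999)

mean_high = sum(s["High"] for s in sims) / len(sims)
print(round(mean_high, 1))
```

Each simulated data set has the same total number of fatalities as the observed data, so differences between observed and simulated counts reflect where the fatalities fall rather than how many there are.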

High Density Areas

There were no simulations with as many fatalities in high density areas as our observed data. This is shown by the output of 0.001 below, which is the number of rows with a value greater than or equal to the observed value, divided by the number of rows in the simulated data plus one row for the observed data. With 999 simulations, only the observed row itself meets the condition, giving 1/1000 = 0.001.

We can conclude that the number of trespasser fatalities in high density areas is significantly greater than what would be expected by population density alone.
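The p-value calculation described above can be written as a small function that handles both the upper-tail case (high density areas) and the lower-tail case (medium and low density areas). This is a minimal sketch: the function name and the simulated counts are illustrative assumptions, while the observed counts of 1,321 and 11 come from the table above.

```python
def monte_carlo_p_value(observed, simulated, tail):
    """One-sided Monte Carlo p-value.

    Counts the simulated values at least as extreme as the observed
    value, then adds one to the numerator and the denominator so the
    observed data set counts as one of the rows.
    """
    if tail == "greater":
        extreme = sum(1 for s in simulated if s >= observed)
    elif tail == "less":
        extreme = sum(1 for s in simulated if s <= observed)
    else:
        raise ValueError("tail must be 'greater' or 'less'")
    return (extreme + 1) / (len(simulated) + 1)


# 999 hypothetical simulated counts, none of which reaches 1,321.
simulated_high = list(range(300, 1299))
p_high = monte_carlo_p_value(1321, simulated_high, tail="greater")

# 999 hypothetical simulated counts, none of which is as low as 11.
simulated_low = list(range(20, 1019))
p_low = monte_carlo_p_value(11, simulated_low, tail="less")

print(p_high, p_low)
```

With 999 simulations, 0.001 is the smallest p-value this procedure can produce, so it means that no simulation was as extreme as the observed data.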

Medium Density Areas

There were no simulations with as few fatalities in medium density areas as our observed data. This is shown by the output of 0.001 below, which is the number of rows with a value less than or equal to the observed value, divided by the number of rows in the simulated data plus one row for the observed data.

We can conclude that the number of trespasser fatalities in medium density areas is significantly less than what would be expected by population density alone.

Low Density Areas

Similar to medium density areas, there were no simulations with as few fatalities in low density areas as our observed data. This is shown by the output of 0.001 below, which is the number of rows with a value less than or equal to the observed value, divided by the number of rows in the simulated data plus one row for the observed data.

We can conclude that the number of trespasser fatalities in low density areas is significantly less than what would be expected by population density alone.

We can reject the null hypothesis and conclude that the number of trespasser fatalities in census tracts is greater than or less than what would be expected by population density alone. Further, trespasser fatalities in high density areas are higher than what would be expected by population density, and trespasser fatalities in medium and low density areas are lower than what would be expected by population density.

Discussion

The results of our hypothesis test indicate that the number of trespasser fatalities in census tracts is not solely dependent on population density. By rejecting the null hypothesis, we can conclude that factors other than the number of people contribute to trespasser fatalities in an area.

High density areas experience significantly more fatalities than would be expected based on population density alone. This suggests that factors other than population density could be at play. For example, homelessness tends to exist primarily in urban areas, and homeless individuals often make encampments next to the railroad. This could increase the likelihood of being hit by a train.

Medium and low density areas experience significantly fewer strikes than would be expected based on population density alone. This could be due to less frequent train service, fewer homeless people, or fewer pedestrians near the train tracks.

Conclusion

Trespasser fatalities in high density areas are significantly higher than what would be expected by population density alone. In contrast, trespasser fatalities in medium and low density areas are significantly lower than what would be expected by population density alone. Policy makers and safety officials should focus on improving safety efforts in areas with high population density. Specific cities that should be investigated are Richmond, Berkeley, Davis, and Modesto. Further research could investigate other factors that may contribute to trespasser strikes in addition to population density, such as homelessness, pedestrian crossings, or crime rates.

References

Kidda, Starr, Stephanie G Chase, Danielle Hiltunen, et al. 2020. “Fatal Trespasser Strikes in the United States: 2012-2017 [Research Results].” United States. Department of Transportation. Federal Railroad Administration.
Savage, Ian. 2007. “Trespassing on the Railroad.” Research in Transportation Economics 20: 199–224.